Event-driven architectures

  1. "Flash: an Efficient and Portable Web Server" vs Apache

Which model is better?

When comparing two different solutions, the answer depends on the metrics and on who cares about them.

Are threads useful?

Threads are useful because

  1. parallelization => speed up
  2. specialization => hot cache
  3. efficiency => lower memory requirement and cheaper synchronization

Threads hide latency of I/O operations (single CPU)

  1. depends on metrics
  2. depends on workload
    1. Different number of toy orders => different implementation for the toy shop
    2. Different type of graph => different shortest-path algorithm
    3. Different file access patterns => different file system

Which metrics are useful?

  1. For a matrix multiply application: execution time
  2. For a web service application:
    1. number of client requests / time
    2. response time
    3. average, min, max, 95th percentile
  3. for hardware: higher utilization
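For a web service, the average/min/max/95th-percentile summary above can be computed directly; a minimal sketch (the sample values are made up):

```python
import statistics

def summarize_response_times(samples_ms):
    """Summarize response-time samples: average, min, max, 95th percentile."""
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile: smallest value >= 95% of the samples.
    idx = max(0, int(0.95 * len(ordered)) - 1)
    return {
        "avg": statistics.mean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p95": ordered[idx],
    }

# One outlier (300 ms) dominates the max but barely moves the p95.
stats = summarize_response_times([12, 15, 11, 300, 14, 13, 16, 12, 18, 14])
```

This is why the notes list several statistics: the average alone hides the outlier, while min/max/p95 expose the spread clients actually experience.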

Depends on metrics

Common metrics

  1. Execution time
  2. Throughput
  3. Response time
  4. Request rate
  5. Utilization
  6. Wait time
  7. Platform efficiency
  8. Performance/$
  9. Performance/watt
  10. Percentage of SLA (Service Level Agreement) violations
  11. Client perceived performance
  12. Aggregate performance (average of the past)
  13. Average resource usage
  14. ....

Performance metrics: a measurement standard

  1. measurable and/or quantifiable property
  2. of the system we are interested in
  3. that can be used to evaluate the system behavior
  4. obtained from a testbed:
    1. experiments with real software deployment, real machines, real workloads
    2. 'toy' experiments representative of realistic settings
    3. simulation
  5. in order to perform the evaluation and comparison, we should explore the values of the metrics over some meaningful range of parameters, such as workload, allocated resources, etc.

How to best provide concurrency

Multi-process vs. multi-threaded

Web server: concurrent processing of client requests

  1. client/browser sends a request
  2. web server accepts the request
  3. server processing steps (some are computationally intensive; others require interaction with the network or the disk and may block depending on the state of the system)
  4. respond by sending the file

Multi process (MP)

  1. simple programming: many processes
  2. high memory usage
  3. costly context switch
  4. hard/costly to maintain shared state
  5. tricky port setup: all processes must accept on the same address, sharing the same socket and port

Multi Threaded (MT)

  1. Every thread performs the full cycle, from accepting the connection through sending the data
  2. Boss/worker

Pros:

  1. Shared address space
  2. Shared state
  3. Cheap context switch

Cons:

  1. not as simple an implementation
  2. requires synchronization
  3. requires underlying OS support for threads
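A minimal boss/worker sketch of the MT model, assuming a plain shared queue: the boss enqueues accepted "connections", workers pull and serve them. Names here are illustrative, not from the paper.

```python
import queue
import threading

task_queue = queue.Queue()
results = []
results_lock = threading.Lock()   # shared state needs synchronization

def worker():
    while True:
        conn = task_queue.get()
        if conn is None:          # sentinel: no more work, shut down
            break
        with results_lock:
            results.append(f"served {conn}")
        task_queue.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for request_id in range(5):       # the "boss": accept and enqueue requests
    task_queue.put(request_id)

task_queue.join()                 # wait until all requests are processed
for _ in workers:
    task_queue.put(None)
for w in workers:
    w.join()
```

Note how the cons above show up even in this toy: the lock around `results` is the synchronization cost that SPED avoids.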

Event-Driven Model

  1. Single address space
  2. Single process
  3. Single thread of control


  1. Dispatcher == state machine: accepts all types of external event notifications and invokes the appropriate handler based on the notification type.
  2. Call handler == jump to the code of the handler

Handler

  1. run to completion
  2. if they need to block: initiate the blocking operation and pass control back to the dispatch loop
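The dispatcher-as-state-machine idea can be sketched as a plain loop over an event queue; event names and handlers here are hypothetical:

```python
from collections import deque

# Pending event notifications: (event type, connection id).
events = deque([("accept", 1), ("read", 1), ("send", 1)])
log = []

def on_accept(conn): log.append(f"accepted {conn}")
def on_read(conn):   log.append(f"read request on {conn}")
def on_send(conn):   log.append(f"sent response on {conn}")

handlers = {"accept": on_accept, "read": on_read, "send": on_send}

while events:                  # the dispatch loop
    kind, conn = events.popleft()
    handlers[kind](conn)       # "call handler" == jump to the handler's code
```

Each handler runs to completion; a handler that would block must instead enqueue a follow-up event and return to the loop.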

Concurrent Execution in Event-Driven Model

In MP and MT: one request per execution context (process/thread)

Event-driven: many requests interleaved in an execution context. A single thread switches among processing of different requests.

Why does this work

On 1 CPU "thread hide latency"

  1. if (t_idle > 2 * t_ctx_switch), then context switch to hide latency
  2. if (t_idle == 0), then context switching just wastes cycles that could have been used for request processing
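The rule of thumb as a one-liner; the factor of 2 accounts for switching away and then back again:

```python
def should_context_switch(t_idle, t_ctx_switch):
    # Switching only pays off if the idle time exceeds the cost of
    # two context switches (one away from the thread, one back to it).
    return t_idle > 2 * t_ctx_switch

worth_it = should_context_switch(100, 10)      # long I/O wait: switch
not_worth_it = should_context_switch(0, 10)    # no idle time: don't
```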

Event driven:

  1. Process a request until a wait is necessary, then switch to processing another request
  2. Even with multiple CPUs: run multiple event-driven processes, each handling many concurrent requests
  3. This has less overhead than context switching

How does this work?

  1. Sockets: interface to the network
  2. Files: interface to the disk
  3. File descriptors: the underlying data structure representing sockets and files is identical

Which file descriptor?

  1. select(): takes sets of file descriptors and blocks until at least one has input, then reports which are ready
  2. poll()
  3. both select() and poll() must scan a large list of descriptors, of which typically only a few have input, so a lot of search time is wasted
  4. epoll(): avoids rescanning the full list on every call
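Python's `selectors` module wraps these calls and picks the most efficient mechanism available (epoll on Linux, falling back to poll/select); a small sketch with a socket pair standing in for a client connection:

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # epoll/kqueue/poll/select, as available
r, w = socket.socketpair()         # stand-in for a client connection
sel.register(r, selectors.EVENT_READ)

w.send(b"GET /index.html")         # simulate input arriving on one socket

ready = sel.select(timeout=1)      # blocks until some descriptor has input
messages = [key.fileobj.recv(1024) for key, _ in ready]

sel.unregister(r)
r.close()
w.close()
```

A real event-driven server would register many sockets and dispatch each ready descriptor to its handler inside the loop.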

Benefits of Event-driven model

  1. single address space
  2. single flow of control
  3. smaller memory requirement
  4. no context switching
  5. no synchronization

Problem with Event-Driven Model

A single blocking request/handler call can block the whole process.

Asynchronous I/O operations

  1. Process/thread makes system call
  2. OS obtains all relevant info from stack and either learns where to return results or tells caller where to get results later.
  3. process/thread can continue, and later come back to check whether the results are available
  4. Requires support from kernel (multi threads) and/or device (DMA)
  5. OS can use select(), poll(), or epoll() with Event-Driven Model
  6. Fits nicely with Event-Driven Model
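A hedged sketch of the async-I/O pattern using `asyncio`, which plays the role of the OS notification machinery here; `slow_read` is a made-up stand-in for a blocking disk read:

```python
import asyncio

async def slow_read(name, delay):
    await asyncio.sleep(delay)     # stands in for a disk read in flight
    return f"{name}: done"

async def main():
    # Initiate both "I/O operations"; neither blocks the other, and the
    # caller collects both results once they are available.
    return await asyncio.gather(slow_read("a", 0.01), slow_read("b", 0.01))

results = asyncio.run(main())
```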

What if Async Calls are not available?

Helpers:

  1. Designated for blocking I/O operations only
  2. communicate with the event dispatcher via pipe- or socket-based messaging, so the dispatcher can check whether the helpers have any events for it
  3. the blocking I/O call is handled by the helper; the helper blocks, but the main event loop (and process) does not
  4. if the helpers are processes: Asymmetric Multi-Process Event-Driven model (AMPED); if threads: AMTED
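A minimal sketch of the helper idea, shown in the AMTED (thread) variant for portability; a queue stands in for the pipe/socket channel back to the dispatcher, and all names are illustrative:

```python
import queue
import threading

completion_queue = queue.Queue()   # stands in for the pipe to the dispatcher

def helper(request):
    # The helper may block here (e.g., on a disk read); the main event
    # loop never does, because it only polls the completion queue.
    data = f"data for {request}"
    completion_queue.put(data)     # notify the dispatcher of the result

t = threading.Thread(target=helper, args=("/var/www/index.html",))
t.start()

# Main loop: keep serving other events, then pick up the helper's result.
reply = completion_queue.get(timeout=5)
t.join()
```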

Pros:

  1. resolves the portability limitations of the basic event-driven model
  2. smaller footprint than a regular worker thread

Cons:

  1. applicability limited to certain classes of applications
  2. event routing on multi-CPU systems


Flash: Event-Driven Web Server

  1. an event-driven web server (AMPED)
  2. with asymmetric helper processes to deal with blocking I/O operations
  3. helpers used for disk reads
  4. pipes used for communication with the dispatcher
  5. helper reads the file into memory (via an mmap call)
  6. dispatcher checks (via mincore) whether the file's pages are in memory, to decide between the 'local' handler and a helper
  7. this check can yield big savings, since the file may already be in memory
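The helper's mmap step can be sketched as follows; mincore has no binding in the Python standard library, so only the mapping/read part is shown, with a temp file standing in for a served file:

```python
import mmap
import os
import tempfile

# Create a stand-in file for the helper to serve.
fd, path = tempfile.mkstemp()
os.write(fd, b"<html>hello</html>")
os.close(fd)

with open(path, "rb") as f:
    # Map the whole file; pages are faulted into memory on access,
    # which is what makes Flash's later mincore check worthwhile.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        body = m[:]

os.remove(path)
```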

Flash: Additional Optimizations

  1. application-level caching
    1. data: files
    2. computation:
      1. Pathname translation cache (helper): look up the directory data structure and cache the results
      2. Response header cache: the HTTP header for each file
  2. All data structures are aligned for DMA operations
  3. Use of DMA with scatter-gather support: the header and the actual data don't have to be aligned next to each other; they can be sent from different memory locations, so copies are avoided => vector I/O operations
  4. All the features above are now fairly common optimizations, but they were not at the time of the paper.
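The two computation caches can be sketched as simple dictionaries keyed by URL; the paths and header format are illustrative, not Flash's actual code:

```python
pathname_cache = {}   # URL -> translated filesystem path
header_cache = {}     # URL -> precomputed HTTP response header

def translate(url):
    # Stand-in for the directory-structure lookup the helper performs;
    # repeated requests for the same URL skip the lookup entirely.
    if url not in pathname_cache:
        pathname_cache[url] = "/var/www" + url
    return pathname_cache[url]

def response_header(url, size):
    # The header for a given file rarely changes, so build it once.
    if url not in header_cache:
        header_cache[url] = (
            f"HTTP/1.0 200 OK\r\nContent-Length: {size}\r\n\r\n"
        )
    return header_cache[url]

path = translate("/index.html")
header = response_header("/index.html", 18)
```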

Apache

  1. Core: basic server skeleton that accepts connections and manages concurrency.
  2. Modules: different types of functionality executed on each request.
  3. Flow of control: similar to the event-driven model. Every request goes through all the modules.
  4. Combination of MP & MT
  5. Each process/instance is a boss/worker with a dynamic thread pool (configurable thresholds to increase/decrease the number of threads in the pool)
  6. The number of processes can also be dynamically adjusted.

Setting Up Performance Comparison

  1. Comparison points (what systems?)
    1. MP (each process single-threaded)
    2. MT(Boss-worker)
    3. Single Process Event-Driven (SPED)
    4. Zeus (SPED with 2 processes to deal with I/O blocking situation)
    5. Apache (v1.3.1, MP)
    6. For all but Apache, the optimizations introduced by the Flash paper are implemented
    7. Compare against Flash (AMPED model)
  2. Inputs/workloads

    1. Realistic request workload: distribution of web page accesses over time
    2. Controlled, reproducible workload: trace-based (from real web servers)
    3. CS web server trace (Rice Univ): a large number of files which don't fit in the memory
    4. Owlnet trace (Rice Univ): student webpages, small files
    5. Synthetic workload generator (best/worst situation, what-if, ...)
  3. Metrics

    1. Bandwidth == bytes/time: total bytes transferred from files / total time
    2. Connection rate == requests/time: total client connections / total time
    3. Evaluated both as a function of file size
      1. larger file size
        1. amortizes per-connection costs => higher bandwidth
        2. more work per connection => lower connection rate
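The two metrics as formulas, with toy numbers (100 requests for a 50 KB file served in 2 seconds; the numbers are made up, not from the paper):

```python
def bandwidth(total_bytes, total_seconds):
    # Total bytes transferred from files / total time.
    return total_bytes / total_seconds

def connection_rate(total_connections, total_seconds):
    # Total completed client connections / total time.
    return total_connections / total_seconds

bw = bandwidth(100 * 50_000, 2.0)     # bytes per second
rate = connection_rate(100, 2.0)      # connections per second
```

The file-size trade-off above falls out of these definitions: bigger files push `bw` up (more bytes per connection) but `rate` down (more work per connection).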

Best Case Numbers/Single File Trace

Synthetic load:

  1. vary the number of requests for the same file => best case
  2. measure bandwidth: $Bandwidth = n \cdot bytes(F) / time$
  3. file size 0–200 KB: varies the work per request

Observations

  1. All exhibit similar results
  2. SPED has best performance
  3. Flash (AMPED): extra check for memory presence, but no blocking I/O is needed
  4. Zeus has an anomaly, due to misalignment for DMA operations
  5. MT/MP: extra synchronization & context switching
  6. Apache: lacks the optimizations


Owlnet Trace/Small trace

  1. Trends similar to "best" case
  2. Small trace, mostly fits in cache
  3. Sometimes blocking I/O is required
    1. SPED will block
    2. Flash's helpers resolve the problem

CS Trace/Large Trace

  1. larger trace mostly requires I/O
  2. SPED worst: lack of async I/O
  3. MT better than MP
    1. smaller memory footprint => more memory to cache file and less I/O
    2. cheaper (faster) sync
  4. Flash best
    1. smaller memory footprint
    2. more memory for caching
    3. fewer requests lead to blocking I/O
    4. no sync needed


Impact of Optimizations

  1. Optimizations are important
  2. Apache would have benefited too

Summary of performance results

  1. When data is in cache:
    1. SPED >> AMPED Flash: unnecessary test for memory presence
    2. SPED & AMPED Flash >> MT/MP: sync & context-switching overhead
  2. With a disk-bound workload:
    1. AMPED Flash >> SPED: SPED blocks because it has no async I/O
    2. AMPED Flash >> MT/MP: more memory-efficient and less context switching
  3. Disadvantage of Flash: not for every application, e.g., multi-CPU systems or other software architectures


Design Relevant Experiments

  1. Relevant experiments: experiments that lead to statements about a solution that others believe in and care about

Purpose of relevant experiments:

Example: web server experiment

  1. clients: response time
  2. Operators: throughput

Possible goals:

  1. improved response time, improved throughput: great
  2. improved response time: OK
  3. improved response time, degraded throughput: may still be useful
  4. maintain response time when the request rate increases

Goals determine the metrics & configuration of the experiments.

Picking the right metrics

"Rule of thumb" for Picking Metrics:

  1. "Standard" metrics: broader audience
  2. Metrics answering the "why? what? who?" questions: why am I doing this, what do I want to improve or understand, who cares about it
    1. client performance:
      1. response time
      2. number of timed-out requests
    2. operator costs:
      1. throughput
      2. costs

Pick the right configuration space

  1. System Resources:
    1. Hardware (CPU, memory, etc.)
    2. Software (Number of threads, queue sizes, etc.)
  2. Workload
    1. Web server:
      1. request rate
      2. number of concurrent requests
      3. file size
      4. access pattern
    2. choose the subset of configuration parameters that are likely most impactful on the metrics you are observing
    3. pick ranges for each variable factor; the range must be relevant
    4. pick relevant workloads
    5. include best/worst case scenarios, which demonstrate limitations/opportunities
    6. pick useful combinations of factors
      1. many combinations just reiterate the same point
    7. compare apples to apples
      1. poor example:
        1. large workload, small resource allocation
        2. small workload, large resource allocation
        3. conclusion: performance improves when resources increase (wrong)
  3. Competition/Baseline
    1. State-of-the-art
    2. Most common practice
    3. ideal best/worst case scenarios
  4. run test cases n times
  5. compute metrics (average)
  6. represent results
  7. Conclusion
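Steps 4–5 (run test cases n times, compute the average) can be sketched as follows; the trial function and throughput numbers are made up:

```python
import statistics

def run_experiment(trial_fn, n=5):
    # Run each configuration n times and report mean and spread, so a
    # single noisy run doesn't drive the conclusion.
    samples = [trial_fn() for _ in range(n)]
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical trial: each call returns a measured throughput (req/s).
fake_results = iter([480.0, 510.0, 495.0, 505.0, 490.0])
mean, spread = run_experiment(lambda: next(fake_results), n=5)
```

Reporting the spread alongside the mean also makes it easier to judge whether a difference between two systems is real or within noise.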
